{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<h1> Hyper-parameter tuning </h1>\n",
    "\n",
    "In this notebook, you will learn how to carry out hyper-parameter tuning.\n",
    "\n",
    "This notebook takes several hours to run."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<h2> Environment variables for project and bucket </h2>\n",
    "\n",
    "Change the cell below to reflect your Project ID and bucket name. See Lab 3a for setup instructions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "PROJECT = 'cloud-training-demos'    # CHANGE THIS\n",
    "REGION = 'us-central1' # Choose an available region for Cloud MLE from https://cloud.google.com/ml-engine/docs/regions.\n",
    "BUCKET = 'cloud-training-demos-ml' # REPLACE WITH YOUR BUCKET NAME. Use a regional bucket in the region you selected."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "# for bash\n",
    "os.environ['PROJECT'] = PROJECT\n",
    "os.environ['BUCKET'] = BUCKET\n",
    "os.environ['REGION'] = REGION"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "gcloud config set project $PROJECT\n",
    "gcloud config set compute/region $REGION"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<h1> 1. Command-line parameters to task.py </h1>\n",
    "\n",
    "Note the command-line parameters to task.py.  These are the things that could be hypertuned if we wanted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "!grep -A 2 add_argument taxifare/trainer/task.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<h1> 2. Evaluation metric </h1>\n",
    "\n",
    "We add a special evaluation metric. It could be any objective function we want."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "!grep -A 5 add_eval_metrics taxifare/trainer/model.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<h1> 3. Make sure outputs do not clobber each other </h1>\n",
    "\n",
    "We append the trial-number to the output directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "!grep -A 5 \"trial\" taxifare/trainer/task.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<h1> 4. Create hyper-parameter configuration </h1>\n",
    "\n",
    "The file specifies the search region in parameter space.  Cloud MLE carries out a smart search algorithm within these constraints (i.e. it does not try out every single value)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "%%writefile hyperparam.yaml\n",
    "trainingInput:\n",
    "  scaleTier: STANDARD_1\n",
    "  hyperparameters:\n",
    "    goal: MINIMIZE\n",
    "    maxTrials: 30\n",
    "    maxParallelTrials: 3\n",
    "    hyperparameterMetricTag: rmse\n",
    "    params:\n",
    "    - parameterName: train_batch_size\n",
    "      type: INTEGER\n",
    "      minValue: 64\n",
    "      maxValue: 512\n",
    "      scaleType: UNIT_LOG_SCALE\n",
    "    - parameterName: nbuckets\n",
    "      type: INTEGER\n",
    "      minValue: 10\n",
    "      maxValue: 20\n",
    "      scaleType: UNIT_LINEAR_SCALE\n",
    "    - parameterName: hidden_units\n",
    "      type: CATEGORICAL\n",
    "      categoricalValues: [\"128 32\", \"256 128 16\", \"64 64 64 8\"]       "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<h1> 5. Run the training job </h1>\n",
    "\n",
    "Just --config to the usual training command."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "%%bash\n",
    "OUTDIR=gs://${BUCKET}/taxifare/ch4/taxi_trained\n",
    "JOBNAME=lab4a_$(date -u +%y%m%d_%H%M%S)\n",
    "echo $OUTDIR $REGION $JOBNAME\n",
    "gcloud storage rm --recursive --continue-on-error $OUTDIR\n",
    "gcloud ml-engine jobs submit training $JOBNAME \\\n",
    "   --region=$REGION \\\n",
    "   --module-name=trainer.task \\\n",
    "   --package-path=${PWD}/taxifare/trainer \\\n",
    "   --job-dir=$OUTDIR \\\n",
    "   --staging-bucket=gs://$BUCKET \\\n",
    "   --scale-tier=STANDARD_1 \\\n",
    "   --runtime-version=1.4 \\\n",
    "   --config=hyperparam.yaml \\\n",
    "   -- \\\n",
    "   --train_data_paths=\"gs://$BUCKET/taxifare/ch4/taxi_preproc/train*\" \\\n",
    "   --eval_data_paths=\"gs://${BUCKET}/taxifare/ch4/taxi_preproc/valid*\"  \\\n",
    "   --output_dir=$OUTDIR \\\n",
    "   --train_steps=5000"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "<h2>6. Train chosen model on full dataset</h2>\n",
    "\n",
    "Look at the last section of the <a href=\"feateng.ipynb\">feature engineering notebook</a>.  The extra parameters are based on hyper-parameter tuning."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Copyright 2016 Google Inc. Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.14"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
